Abstract. This paper presents results of several imbalanced learning techniques applied to operator functional state assessment
where the data is highly imbalanced, i.e., some functional states (majority classes) have many more training samples than other
states (minority classes). Conventional machine learning techniques tend to classify all data samples into majority classes
and perform poorly for minority classes. In this study, we implemented five imbalanced learning techniques, namely random
under-sampling, random over-sampling, synthetic minority over-sampling technique (SMOTE), borderline-SMOTE and adaptive synthetic
sampling (ADASYN), to solve this problem. Experimental results on a benchmark driving test dataset show that accuracies for
minority classes can be improved dramatically at the cost of slight performance degradations for majority classes.


1.0 INTRODUCTION 

An Operator Functional State (OFS) refers 
to a multidimensional pattern of the human 
psychophysiological condition that mediates 
performance in relation to physiological and 
psychological costs [1]. Accurate OFS
assessment for human operators plays a
critical role in automated aviation systems
because it can ensure mission success and
improve mission performance [2].

Researchers proposed various modeling 
tools to assess OFS. In Ref. [3], a stepwise
discriminant analysis (SWDA) method and
artificial neural networks (ANN) were 
proposed to perform OFS assessment. As a 
nonlinear model, the ANN is considered 
more advantageous in complex task 
situations, especially if multiple features are 
used. In Ref. [2], committee machines 
proved useful in improving the assessment 
accuracy. Errors of individual committee 
members can be canceled if the errors are 
independent. Therefore, improvement can 
be achieved if individual members have low 
biases and are less correlated, i.e., they are
diversified [4]. In addition to the traditional 
“bagging” technique, which generates 
multiple versions of prediction based on the 
bootstrap technique to produce the final 
prediction [5], performing a feature selection
procedure before training can further reduce
correlations among committee members [2]. 

To successfully perform OFS assessment,
however, researchers often face the
challenge of modeling imbalanced datasets,
i.e., datasets in which some OFSs have
many more data samples than others. In the
machine learning community, the OFSs
having more data samples are called
'majority' classes while those having fewer
samples are called 'minority' classes.
Traditional classifiers tend to classify all
data samples into majority classes, resulting
in poor performance for minority classes [6],
which is not acceptable for OFS assessment.

Many imbalanced learning techniques have 
been proposed to balance performances 
among majority and minority classes. Those 
techniques could be divided into four 
categories [6]: sampling methods, cost- 
sensitive methods, kernel-based methods, 
and active learning methods. Sampling 
methods aim to reduce the imbalance by 
removing (under-sampling) samples from 
majority classes or generating (over- 
sampling) more training samples for 
minority classes [7]. Cost-sensitive methods 
improve classification performance by using 
different cost matrices to compensate for
imbalanced classes [8]. Kernel-based
methods, such as the support vector
machine (SVM), are based on the principles
of statistical learning and Vapnik-
Chervonenkis (VC) dimensions [9]. Active
learning is a type of iterative supervised
learning technique that is used in
situations where unlabeled data is
abundant. Active learning is often integrated
into kernel-based learning methods by
selecting the instance closest to the current
hyperplane from the unseen training data
and adding it to the training set to retrain the
model [10].

We have developed an OFS assessment 
strategy based on a committee machine for 
a closed-loop adaptive task management
system, where the OFS assessment was
treated as a regression problem [2]. In this 
paper, we redesigned a similar model for 
the same task; however, we treated the 
OFS assessment as a classification 
problem. Because the data sets are highly 
imbalanced, traditional classifiers failed to 
classify minority states. We implemented
several imbalanced learning techniques to
improve classification performance for those
minority OFS states.

The remainder of the paper is organized as 
follows: Section 2 describes several
imbalanced learning techniques
implemented in this paper. Section 3
presents the architecture of a committee 
classifier. Section 4 illustrates our 
experimental design, including 
implementation of the imbalanced learning 
techniques and the design of a committee 
classifier. Section 5 shows our achieved 
experimental results. Section 6 provides 
discussions for the results and Section 7 
concludes the paper. 

2.0 IMBALANCED LEARNING 
TECHNIQUES 

There exist many imbalanced learning 
techniques in the literature as described in 
the excellent review paper [6]. In our study, 
we implemented five of them as described 
below. 


• Random under-sampling 

• Random over-sampling 

• Synthetic minority over-sampling 
technique (SMOTE) 

• Borderline-SMOTE 

• Adaptive synthetic sampling (ADASYN) 

All the methods are detailed in Ref. [6],
including their implementations,
performance and limitations. The overall
goal of those methods is to balance data
samples among classes by dropping some
data samples from majority classes and
adding samples to minority classes, so that
all classes have roughly equal numbers of
data samples.

2.1 Random under-sampling 

Random under-sampling was only applied 
to majority classes. The method randomly 
selects a number of majority data samples 
to keep. This method may lose information
in the majority classes.

2.2 Random over-sampling 

The random over-sampling method was
only applied to minority classes. In contrast
to the random under-sampling technique,
this method randomly selects data samples
from minority classes and duplicates them
until the data set is roughly balanced. This
method may lead to overfitting because
data samples are repeatedly used.
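
The two random sampling methods of Sections 2.1 and 2.2 can be sketched in a few lines. This is a minimal illustration rather than the implementation used in the study; the function names, the sample lists and the target size of 20 are hypothetical.

```python
import random

def random_under_sample(samples, n_keep, rng):
    """Keep n_keep randomly chosen samples (drawn without replacement)."""
    return rng.sample(samples, n_keep)

def random_over_sample(samples, n_target, rng):
    """Append randomly chosen duplicates until n_target samples exist."""
    extra = [rng.choice(samples) for _ in range(n_target - len(samples))]
    return list(samples) + extra

rng = random.Random(0)               # fixed seed, for repeatability only
majority = list(range(100))          # hypothetical majority-class samples
minority = [1000, 1001, 1002, 1003]  # hypothetical minority-class samples

kept = random_under_sample(majority, 20, rng)   # 80 samples are discarded
grown = random_over_sample(minority, 20, rng)   # 16 duplicates are added
```

Both classes end up with 20 samples, but `kept` has lost 80 majority samples and `grown` contains only four distinct values, which is the source of the information loss and overfitting risks noted above.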

2.3 SMOTE 

To overcome the overfitting defect of the 
random over-sampling method, SMOTE 
generates or synthesizes new samples for 
minority classes. To create a new synthetic 
sample for a given data point (seed) from 
minority classes, it first randomly selects 
one of its K-nearest minority neighbors (K is 
specified by researchers arbitrarily). Then, a 
random point that is on the line between the 
seed and the selected neighbor will be 
synthesized as a new data sample. SMOTE
may lead to the problem of
overgeneralization [12]. The following
methods, Borderline-SMOTE and ADASYN,
were developed to overcome this limitation.
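
The synthesis step described above can be sketched as follows. This is an illustrative sketch only: Euclidean distance, the uniform random choice of seeds and the function name `smote` are assumptions, not details taken from the study.

```python
import math
import random

def smote(minority, n_new, k, rng):
    """Synthesize n_new points, each on the segment between a minority
    seed and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        seed = rng.choice(minority)
        # k nearest minority neighbours of the seed, excluding the seed itself
        others = [p for p in minority if p is not seed]
        neighbours = sorted(others, key=lambda p: math.dist(seed, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # relative position between seed and neighbour
        synthetic.append(
            tuple(s + gap * (n - s) for s, n in zip(seed, neighbour))
        )
    return synthetic

# Hypothetical minority samples at the corners of the unit square
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=5, k=2, rng=random.Random(0))
```

With K = 2, the two nearest neighbours of each corner are the adjacent corners, so every synthetic point lies on an edge of the square.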

2.4 Borderline-SMOTE

Borderline-SMOTE and SMOTE differ in the
ways they select seeds. SMOTE may select
any minority sample as a seed while
Borderline-SMOTE only considers minority
samples that are on the borderline between
minority and majority classes. A minority
class sample is considered to be on the
borderline if the majority of its M nearest
samples belong to majority classes (M is
specified by researchers arbitrarily).
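
The seed selection rule can be sketched as below. The sketch follows only the rule stated in the text (more than half of the M nearest samples are majority samples); Euclidean distance, the function name and the sample coordinates are assumptions for illustration.

```python
import math

def borderline_seeds(minority, majority, m):
    """Return the minority samples whose m nearest samples (from either
    class) are mostly majority samples."""
    labelled = [(p, 0) for p in minority] + [(p, 1) for p in majority]
    seeds = []
    for seed in minority:
        others = [(p, lab) for p, lab in labelled if p is not seed]
        nearest = sorted(others, key=lambda item: math.dist(seed, item[0]))[:m]
        n_majority = sum(lab for _, lab in nearest)
        if n_majority > m / 2:
            seeds.append(seed)
    return seeds

# Hypothetical data: three minority samples far from the majority class
# and one minority sample surrounded by majority samples
minority = [(0.0, 0.0), (0.2, 0.0), (0.1, 0.1), (2.0, 0.0)]
majority = [(2.2, 0.0), (2.0, 0.2), (1.8, 0.0), (3.0, 0.0)]
seeds = borderline_seeds(minority, majority, m=3)
```

Only the sample at (2.0, 0.0) qualifies as a seed here, so new samples would be synthesized near the class boundary rather than inside the minority cluster.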

2.5 ADASYN 

The difference between ADASYN and
SMOTE is the number of new data samples
to be synthesized for each seed. SMOTE
generates the same number of data
samples for each seed while ADASYN
synthesizes data samples according to the
distribution of seeds. Considering the K
nearest neighbors of a seed, the more of
them that belong to majority classes, the
more new samples will be synthesized for
the seed.
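
The allocation rule can be sketched as follows: each seed receives a share of the new samples proportional to the fraction of majority samples among its K nearest neighbors. The function name, the Euclidean distance and the example data are assumptions made for illustration.

```python
import math

def adasyn_counts(minority, majority, k, n_total):
    """Split n_total new samples among minority seeds in proportion to the
    fraction of majority samples among each seed's k nearest samples."""
    labelled = [(p, 0) for p in minority] + [(p, 1) for p in majority]
    ratios = []
    for seed in minority:
        others = [(p, lab) for p, lab in labelled if p is not seed]
        nearest = sorted(others, key=lambda item: math.dist(seed, item[0]))[:k]
        ratios.append(sum(lab for _, lab in nearest) / k)
    total = sum(ratios)
    if total == 0:
        # No seed has a majority neighbour, so there is nothing to weight by
        return [0] * len(minority)
    # Rounding means the counts may not add up to n_total exactly
    return [round(n_total * r / total) for r in ratios]

minority = [(0.0, 0.0), (0.2, 0.0), (0.1, 0.1), (2.0, 0.0)]
majority = [(2.2, 0.0), (2.0, 0.2), (1.8, 0.0), (3.0, 0.0)]
counts = adasyn_counts(minority, majority, k=3, n_total=12)
```

In this example the seed surrounded by majority samples receives 6 of the 12 new samples, while each of the other three seeds receives 2.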

3.0 COMMITTEE MACHINE 

A committee machine is an ensemble of
multiple estimators (committee members),
which could be any learning method for
classification or regression. The output of a
committee machine is a fusion of the outputs
from all of its members. A theoretical
interpretation of the principle of the
committee machine is that the errors from
individual committee members can be
canceled to some extent if they are
uncorrelated.
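
The cancellation argument can be made concrete for the simplest fusion rule, averaging. This is an illustrative assumption, since the committee in this paper fuses class labels by voting. Suppose each of the $M$ members has a zero-mean error $e_i$ with variance $\sigma^2$, and every pair of errors has correlation $\rho$ (symbols introduced here for illustration). Then

```latex
\[
\operatorname{Var}\!\left(\frac{1}{M}\sum_{i=1}^{M} e_i\right)
= \frac{1}{M^2}\left(M\sigma^2 + M(M-1)\rho\,\sigma^2\right)
= \rho\,\sigma^2 + \frac{1-\rho}{M}\,\sigma^2 .
\]
```

For uncorrelated members ($\rho = 0$) the error variance shrinks to $\sigma^2/M$, whereas for fully correlated members ($\rho = 1$) it stays at $\sigma^2$ and the committee brings no benefit.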

Research results show that the performance 
improvement can be affected by two factors: 
accuracies of individual committee 
members and correlations among them [4]. 
For the first factor, selection of an 
appropriate individual model is essential, 
because a better performance will usually 
be achieved if each of the individual 
members performs well. For the second
factor, several techniques such as bagging,
boosting, averaging or voting, and mixture of
experts have proved effective [4]. In this
paper, we use the following techniques to
build the committee machine.


• Use the bootstrapping technique to 
generate multiple 'copies' of the training 
data. 

• Apply an advanced feature selection
algorithm, Piecewise Linear Orthogonal
Floating Search (PLOFS) [11], to
diversify the committee members such
that their performances are not highly
correlated.

• Train a Multi-Layer Perceptron (MLP) by
the standard Back Propagation (BP)
algorithm as a base classification model.

• Delete the committee members having 
high biases (accuracy < 50%). 

• Utilize the majority vote scheme to fuse 
decisions from committee members. For 
example, if a majority of the 15 total
committee members predict class 1, the 
final output of the committee is class 1. 
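
The last two steps in the list (dropping weak members and fusing by vote) can be sketched as follows. This is a minimal sketch: the function name and the example predictions are hypothetical, and the vote is taken as a plurality over the kept members, which is an assumption about how cases without an absolute majority are resolved.

```python
from collections import Counter

def majority_vote(predictions, accuracies, min_accuracy=0.5):
    """Fuse the class labels predicted by committee members, ignoring
    members whose training accuracy is below min_accuracy."""
    kept = [p for p, a in zip(predictions, accuracies) if a >= min_accuracy]
    if not kept:
        raise ValueError("no committee member reaches the accuracy threshold")
    # The most frequent label wins; Counter breaks ties in favour of the
    # label that was encountered first
    return Counter(kept).most_common(1)[0][0]

# 15 hypothetical members: 8 predict class 1, 4 class 2, 3 class 3
predictions = [1] * 8 + [2] * 4 + [3] * 3
all_good = majority_vote(predictions, [0.9] * 15)
# If the 8 members voting for class 1 are too inaccurate, they are dropped
some_bad = majority_vote(predictions, [0.4] * 8 + [0.8] * 7)
```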


The system diagram of the committee 
machine is shown in Figure 1.



Figure 1: Diagram of the Committee Machine 


4.0 EXPERIMENT DESIGN 

4.1 The driving test dataset 

We utilized a driving test dataset to validate 
our proposed method for OFS assessment. 
The dataset was collected from participants
performing a driving test over the course of 
two hours. The collected information 
includes description of the driving task, 
system dynamics related information, 
performance measures, physiological 
signals (128-channel EEG, ECG,
respiration, etc.), and eye tracking. The
workload was also analyzed according to 
the driving conditions (city-driving, stopped, 
highway passing, etc.), and seven OFSs, 
which indicate seven workload levels, were 
defined. 

Six subjects participated in the driving test 
and data was recorded in a separate file for 
each participant, resulting in six individual 
datasets. Each dataset has seven operator 
functional states (workload) that are 
considered as seven classes by our 
committee classifier. In each dataset, the
number of data samples in each class is not
balanced. Four classes (minority classes)
have far fewer data samples than the other
three (majority) classes. Table 1 and
Figure 2 show data distributions for all
classes.

Data distributions are similar for all subjects.
Class 2 has the largest number of samples
(about 35% of the whole data). Classes 3
and 4 have the second largest numbers of
samples (about 20% each). Therefore,
around 75% of samples belong to those
three classes. Class 7 has the smallest
number of samples, accounting for less than
1% of the whole data, and subjects 2, 4 and
6 have no data at all for class 7. Class 6 is
the second smallest class, with about 3% of
the whole data samples. Classes 1 and 5
each account for about 5% of the data
samples.

4.2 Imbalanced learning techniques 

To implement the five imbalanced learning
techniques, we first compute a desired
percentage of data samples per class as

\[
N_d = \frac{1}{\text{no. of classes}} \times 100\% = \frac{1}{7} \times 100\% = 14.29\%
\]

We then calculate a high threshold ($T_h$) and
a low threshold ($T_l$) for the percentage of
data samples in each class as

\[
T_h = N_d \times (1 + 0.1) = 14.29\% \times 1.1 = 15.71\%
\]

\[
T_l = N_d \times (1 - 0.1) = 14.29\% \times 0.9 = 12.86\%
\]


Class   Data set 1   Data set 2   Data set 3   Data set 4   Data set 5   Data set 6
           (%)          (%)          (%)          (%)          (%)          (%)
1          6.17         8.70         6.29         5.59         3.52         3.86
2         38.34        39.24        33.83        39.66        32.65        39.87
3         19.96        21.42        24.56        32.94        26.39        20.16
4         23.55        19.40        21.07        16.43        31.24        27.05
5          8.03         8.25        11.30         3.03         2.99         6.10
6          3.89         2.98         2.67         2.35         2.99         2.96
7          0.06         0.00         0.28         0.00         0.22         0.00

Table 1: Data Distribution among Classes

Figure 2: Data Distribution among Classes


Classes whose share of data samples is
greater than $T_h$ are considered majority
classes, classes whose share is less than
$T_l$ are considered minority classes, and
the others are treated as medium classes.
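
The thresholding rule can be checked with a short sketch. The function name and the `tolerance` parameter (the 0.1 margin used in the threshold formulas) are hypothetical; the percentages are those listed for Data set 2 in Table 1.

```python
def categorize_classes(shares, tolerance=0.1):
    """Label each class as majority, minority or medium from its share (%)."""
    desired = 100.0 / len(shares)          # desired share per class
    t_high = desired * (1 + tolerance)     # high threshold
    t_low = desired * (1 - tolerance)      # low threshold
    labels = {}
    for cls, share in shares.items():
        if share > t_high:
            labels[cls] = "majority"
        elif share < t_low:
            labels[cls] = "minority"
        else:
            labels[cls] = "medium"
    return labels

# Class shares (%) of Data set 2, taken from Table 1
dataset2 = {1: 8.70, 2: 39.24, 3: 21.42, 4: 19.40, 5: 8.25, 6: 2.98, 7: 0.00}
labels = categorize_classes(dataset2)
```

For this dataset, classes 2, 3 and 4 come out as majority classes and classes 1, 5, 6 and 7 as minority classes, with no medium class.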

As such, there are seven classes and $N_d$,
$T_l$ and $T_h$ are 14.29%, 12.86% and
15.71%, respectively. Referring to Table 1, it
is clear that classes 2, 3 and 4 are majority
classes. Classes 1, 5, 6 and 7 are minority
classes and there is no medium class in our
datasets. In order to achieve a balanced
dataset, the data portions in both majority
and minority classes are made roughly the
same as $N_d$. We apply the random
under-sampling technique to the majority
classes and four over-sampling methods to
the minority classes, resulting in four
balanced datasets as shown in Figure 3. For
each participant, the balanced datasets
share the majority classes’ data samples
but have different data samples from
minority classes, depending upon which
over-sampling method is used.

Figure 3: Generation of Balanced Datasets

4.3 Committee classifier 

The committee classifier consists of a
bootstrap procedure, a feature selection
process and a majority voting scheme (see
Figure 4). An MLP trained by the BP
algorithm was implemented as the base
classification model. The basic procedures
performed by the committee classifier are
as follows:

1. Randomly divide a subject’s dataset into
two parts with an equal number of data
points, one for training and another for
testing.

2. Generate M bootstrapped datasets from
the training dataset.

3. Apply one of the imbalanced learning
techniques to the bootstrapped
datasets. A balanced dataset is then
obtained for each of the M datasets.

4. Select a set of the most effective features
for each of the balanced datasets using
the PLOFS algorithm. Selected features
for different datasets may be different.

5. Train an MLP classifier for each of the
datasets using the features selected for
that dataset.

6. Apply the trained MLP to the training
and testing datasets.

7. Generate the final classification result by
majority voting. Only MLPs having training
accuracies greater than 50% are used.
Repeat the above procedures by
exchanging the roles of the training and
testing datasets.

8. Repeat the above steps for each of the
imbalanced learning techniques
described in Section 2.

Figure 4: Design of the Committee Classifier

5.0 RESULTS

We trained a committee classifier for each 
of the six participants (datasets) and results 
are shown in Tables 2-7 and Figures 5-10.

In the tables, the ‘Untreated’ column
shows results achieved on the original
data sets. The other four columns present
accuracies (in percentage) for each class
achieved by applying the four imbalanced
learning techniques to the minority classes.
The last row shows the average (overall)
accuracies achieved by each of the
techniques.

6.0 DISCUSSION 

It is observed that the classification
accuracies are highly imbalanced if no
imbalanced learning technique is used. For
instance, the minority class 7 always has
0% accuracy for all subjects, whereas good
performance is usually achieved for
majority classes 2, 3 and 4. Classification
accuracies have been balanced among
minority and majority classes by applying
the four imbalanced learning techniques to
minority classes. Accuracies have often
been significantly improved for minority
classes while those for majority classes
have decreased slightly. As a result,
the overall performance has been slightly
degraded. Note that different sampling
algorithms appear to perform similarly,
indicating the robustness of the imbalanced
learning techniques.


Table 2: Results for Dataset 1

Figure 5: Results for Dataset 1

Table 3: Results for Dataset 2

Figure 6: Results for Dataset 2

Table 4: Results for Dataset 3

Class   Untreated   OverSample   Smote     Border    AdaSyn
1       89.83%      93.22%       94.92%    95.48%    94.35%
2       100%        96.14%       96.11%    96.14%    96.64%
3

Figure 7: Results for Dataset 3

Table 5: Results for Dataset 4

Figure 8: Results for Dataset 4

Table 6: Results for Dataset 5

Figure 9: Results for Dataset 5

Table 7: Results for Dataset 6

Figure 10: Results for Dataset 6


7.0 CONCLUSIONS 

We have implemented five different
imbalanced learning techniques for OFS
assessment and validated our methods on
driving test benchmark datasets.
Experimental results consistently show that
classification accuracies for minority classes
in the tested datasets are improved
dramatically at the cost of slight
performance degradations for majority
classes, indicating that imbalanced
learning techniques could be very useful for
OFS assessment.

In a practical setting, an OFS assessment 
model will be trained offline. We can utilize 
the imbalanced learning techniques to 
improve recognition accuracies of the 
assessment model for minority OFSs 
without severely decreasing assessment 
effectiveness for majority OFSs. Once the 
model is trained, it will then be able to 
recognize all possible OFSs relatively 
accurately on the fly. This is critical because
some minority OFSs may be closely
related to aviation safety.

Our future work includes further testing the
applicability of more imbalanced learning
techniques to the OFS assessment task,
validating those methods on more subjects’
datasets and integrating the most effective
scheme into a real-time OFS assessment
system.

• Classification based on Committee Machine

• Imbalanced Learning Techniques 

Sampling methods


• Random under-sampling [6] 

• Random over-sampling 

• Synthetic minority over-sampling (SMOTE) 

• Borderline-SMOTE

• Adaptive synthetic sampling (ADASYN) 

Random Under-sampling

• Randomly select a number of majority data
samples and remove them.

• May lose important information.

Random Over-sampling


• Randomly select minority data samples and
add duplicates of them to the training data set.

• May lead to over-fitting. 

Synthetic Minority Over-sampling Technique (SMOTE)

• Synthesize a random point on the line between
the seed and the neighbor

• May lead to overgeneralization


MODSIM WORLD
Conference & Expo


Borderline-SMOTE 


• Borderline-SMOTE and SMOTE differ in how 
they select seeds. Borderline-SMOTE only 
selects seeds on the borderline. 

Adaptive Synthetic Sampling
(ADASYN)

• ADASYN and SMOTE differ in the number of
new samples that need to be synthesized. The
more neighbors belong to majority OFSs, the
more samples need to be synthesized.

Distribution of data samples

[Bar chart: percentage of data samples (0.00% to 45.00%) by Class No. (OFS) 1 to 7, for Data sets 1 to 6]

Achieve balanced data set

[Diagram; recoverable label: Imbalanced Data Set]

Committee Machine

Bootstrapping

Feature Selection

Piecewise Linear Orthogonal Floating Search (PLOFS)

Majority Voting

Framework of the OFS
Assessment Strategy

Results for Data Sets 1 to 6

[Bar charts, one per data set: Accuracy of Classification for classes 1 to 7 and Overall; recoverable legend entries: OverSample, AdaSyn]

[Bar chart: percentage of data samples (0.00% to 45.00%) by Class No. (OFS)]


Outline 


• Introduction 

• Imbalanced Learning Techniques 

• Experiment Design 

• Results & Discussion 

• Conclusion 


Conclusions

• By using imbalanced learning techniques,
classification accuracies for minority OFSs are
improved dramatically at the cost of slight
performance degradations for majority OFSs

• Different sampling algorithms appear to perform
similarly

• Future work

- Test more imbalanced learning techniques

- Validate those techniques on more subjects’ datasets